Uni- and bivariate data transformations using R

SIMP59: Data Selection and Visualisation
7.5 credits VT25

nils.holmberg@iko.lu.se

Canvas info

This lecture introduces key concepts in data analysis using R Markdown notebooks, focusing on working with data structures such as tables, networks, and nested data. Participants will learn how to import data frames, filter rows, and select relevant columns to refine their datasets. The session will cover handling missing values and identifying outliers to ensure data quality. We will explore the dplyr package, using the pipe operator to chain data transformations, and discuss the principles of tidy data for efficient analysis and visualization.
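As a minimal sketch of the import, filter, select, and pipe steps described above: the column names and values are hypothetical, and the data is read from inline text so the example runs on its own. It assumes the dplyr package is installed and R 4.1 or later for the native pipe `|>` (the magrittr pipe `%>%` would work the same way here).

```r
library(dplyr)

# Hypothetical survey data, read from inline text so the example is self-contained.
# In practice you would pass a file path instead, e.g. read.csv("survey.csv").
csv_text <- "id,age,country,score
1,23,SE,4.5
2,35,DK,3.0
3,NA,SE,5.0
4,41,NO,2.5
5,29,SE,NA"

survey <- read.csv(text = csv_text)

# Keep Swedish respondents with a known score (rows),
# then keep only the id and score variables (columns).
swedish_scores <- survey |>
  filter(country == "SE", !is.na(score)) |>
  select(id, score)

print(swedish_scores)
```

Each step takes a data frame and returns a data frame, which is what lets the pipe chain them in reading order.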

We will also explore how to structure data analysis around research questions and variables, ensuring a clear focus on meaningful insights. We will introduce grouping and aggregation techniques in dplyr to summarize data effectively, allowing for comparisons across different categories. Participants will also learn how to reshape data by lengthening and widening formats to better align with analytical needs. The session will cover methods for exporting cleaned and processed data frames for further use.
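Grouping, aggregation, and export can be sketched as follows. The data frame, its column names, and the output location are illustrative assumptions, not course data; the example assumes dplyr is installed.

```r
library(dplyr)

# Hypothetical data: one row per respondent.
responses <- data.frame(
  group = c("control", "control", "treatment", "treatment", "treatment"),
  score = c(3, 5, 6, 8, NA)
)

# One summary row per group: number of respondents and mean score.
# na.rm = TRUE drops missing scores from the mean; n() still counts every row.
group_summary <- responses |>
  group_by(group) |>
  summarise(
    n = n(),
    mean_score = mean(score, na.rm = TRUE),
    .groups = "drop"
  )

print(group_summary)

# Export the processed table. tempfile() is used here only to avoid
# writing into the project folder; a real path would replace it.
out_path <- tempfile(fileext = ".csv")
write.csv(group_summary, out_path, row.names = FALSE)
```

Note that the count and the mean treat the missing score differently, which is worth checking whenever a research question compares groups.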

Course literature

Wickham, Çetinkaya-Rundel, and Grolemund (2023)

Wilke (2019)

Watt and Naidoo (2025)

Lecture overview

  • rmarkdown notebooks, data normalization
  • data structures (tables, networks, unnesting)
  • 7.2 Import dataframes
  • 3.2 Rows (filtering)
  • 3.3 Columns (selecting)
  • missing values, outliers
  • filter and select data
  • 3.4 The pipe (dplyr)
  • 5.2 Tidy data
  • research questions, variables
  • 3.5 Groups
  • 3.6 Aggregates
  • 5.3 Lengthening data
  • 5.4 Widening data
  • summarizing data
  • export dataframe
  • 1.4 Visualizing data
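
For the "missing values, outliers" item in the overview, a small sketch under stated assumptions: the measurements are made up, and the 1.5 × IQR rule is one common convention for flagging outliers, not necessarily the one used in the course. It assumes dplyr is installed.

```r
library(dplyr)

# Hypothetical measurements with one missing value and one extreme value.
measurements <- data.frame(
  id = 1:8,
  value = c(10, 12, 11, 13, NA, 12, 11, 95)
)

# Count missing values before deciding how to handle them.
n_missing <- sum(is.na(measurements$value))

# Quartiles of the observed values, and the 1.5 * IQR distance
# used to place the lower and upper fences.
q <- quantile(measurements$value, c(0.25, 0.75), na.rm = TRUE)
fence <- 1.5 * (q[[2]] - q[[1]])

# Drop rows with missing values, then flag values outside the fences.
flagged <- measurements |>
  filter(!is.na(value)) |>
  mutate(is_outlier = value < q[[1]] - fence | value > q[[2]] + fence)

print(flagged)
```

Flagging rather than deleting keeps the decision about each outlier visible in the data frame.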

whole game

A diagram displaying the data science cycle: Import -> Tidy -> Understand (which has the phases Transform -> Visualize -> Model in a cycle) -> Communicate. Surrounding all of these is Program. Import, Tidy, Transform, and Visualize are highlighted.

Figure 1: In this section of the book, you’ll learn how to import, tidy, transform, and visualize data.

rmarkdown, scripts

import

  • 20 Spreadsheets
  • 21 Databases
  • 22 Arrow
  • 23 Hierarchical data

transform

  • 12 Logical vectors
  • 13 Numbers
  • 14 Strings
  • 15 Regular expressions
  • 16 Factors
  • 17 Dates and times
  • 18 Missing values
  • 19 Joins

figure, pivot

A diagram showing how `pivot_longer()` transforms a simple data set, using color to highlight how column names ("bp1" and "bp2") become the values in a new `measurement` column. They are repeated three times because there were three rows in the input.

Figure 2: The column names of pivoted columns become values in a new column. The values need to be repeated once for each row of the original dataset.
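
The lengthening shown in Figure 2 can be reproduced with a small table that has the same shape: three rows and the columns `bp1` and `bp2`. The `id` column and the values are illustrative assumptions; the example assumes the tidyr package is installed.

```r
library(tidyr)

# Small wide table: one row per person, one column per measurement.
bp_wide <- data.frame(
  id = c("A", "B", "C"),
  bp1 = c(100, 140, 120),
  bp2 = c(120, 115, 125)
)

# Lengthen: the column names bp1 and bp2 become values in `measurement`,
# and the cell contents move into `value`.
bp_long <- pivot_longer(
  bp_wide,
  cols = c(bp1, bp2),
  names_to = "measurement",
  values_to = "value"
)
print(bp_long)

# Widen again: the inverse operation restores one column per measurement.
bp_wide_again <- pivot_wider(
  bp_long,
  names_from = measurement,
  values_from = value
)
print(bp_wide_again)
```

The long table has six rows: "bp1" and "bp2" each appear once for every row of the input, as the caption describes.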

Palmer Penguins


Quantitative methods

    1. Experiments and Threats to Validity
    1. Survey Research, Questionnaire
    1. Quantitative Content Analysis

Lectures and workshops

Data collection (Nov 12)

    1. Concept Explication and Measurement
    1. Reliability and Validity
    1. Effective Measurement
    1. Sampling
    1. Content Analysis

Exam question 1

Data analysis (Nov 26)

    1. Experiments and Threats to Validity
    1. Survey Research
    1. Descriptive Statistics
    1. Inferential Statistics
    1. Multivariate Statistics

Exam question 2

9. Experiments and Threats to Validity

  • Random Assignment (p. 225)
  • Between-Subjects Design (p. 227)
  • Within-Subjects Design (p. 228)
  • Treatment Groups (p. 233)
  • Stimulus (p. 233)
  • Control Group (p. 238)

Next steps

Workshop 2, Dec 2

References

Watt, H., and T. Naidoo. 2025. “Data Wrangling Recipes in R.” https://bookdown.org/hcwatt99/Data_Wrangling_Recipes_in_R/#why-data-wrangling-recipes-in-r.
Wickham, Hadley, Mine Çetinkaya-Rundel, and Garrett Grolemund. 2023. R for Data Science. 2nd ed. O’Reilly Media. https://r4ds.hadley.nz/.
Wilke, Claus O. 2019. Fundamentals of Data Visualization: A Primer on Making Informative and Compelling Figures. O’Reilly Media. https://clauswilke.com/dataviz/.